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Abstract 

Stochastic differential equations (SDEs) provide a natural framework for modelling intrinsic 
stochasticity inherent in many continuous-time physical processes. When such processes are ob¬ 
served in multiple individuals or experimental units, SDE driven mixed-effects models allow the 
quantification of both between and within individual variation. Performing Bayesian inference 
for such models using discrete-time data that may be incomplete and subject to measurement 
error is a challenging problem and is the focus of this paper. We extend a recently proposed 
MCMC scheme to include the SDE driven mixed-effects framework. Fundamental to our ap¬ 
proach is the development of a novel construct that allows for efficient sampling of conditioned 
SDEs that may exhibit nonlinear dynamics between observation times. We apply the resulting 
scheme to synthetic data generated from a simple SDE model of orange tree growth, and real 
data on aphid numbers recorded under a variety of different treatment regimes. In addition, we 
provide a systematic comparison of our approach with an inference scheme based on a tractable 
approximation of the SDE, that is, the linear noise approximation. 

Keywords: Stochastic differential equation; mixed-effects; Markov chain Monte Carlo; modified 
innovation scheme; linear noise approximation. 


1 Introduction 

Diffusion processes satisfying Ito stochastic differential equations (SDEs) are a class of continuous¬ 
time, continuous-valued Markov stochastic processes that can be used to model a wide range of 
phenomena. Examples include (but are not limited to) epidemics, financial time series, population 
dynamics (including predator-prey systems) and intra-cellular processes. When repeated measure¬ 
ments on a system of interest are made, differences between individuals or experimental units can 
be incorporated through random effects. Quantification of both system (intrinsic) variation and 
variation between units leads to a stochastic differential mixed-effects model (SDMEM). 

Unfortunately, analytic intractability of SDEs governing most nonlinear multivariate diffusions 
can make likelihood-based inference methods problematic. Methods to overcome this difficulty 
include closed-form expansion of the transition density (Ait-Sahalia, 2002, 2008; Stramer et ah, 
2010), exact simulation approaches (Beskos et ah, 2006; Sermaidis et ah, 2013) and use of the 
Euler-Maruyama approximation coupled with data augmentation (Pedersen, 1995; Elerian et ah, 
2001; Eraker, 2001; Durham and Gallant, 2002; Golightly and Wilkinson, 2008; Stramer and Bognar, 
2011; Kou et ah, 2012). Difficulty in performing inference for SDEs has resulted in relatively little 
work on SDMEMs. 
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Picchini et al. (2010) propose a procedure for obtaining approximate maximum likelihood es¬ 
timates for SDMEM parameters based on a two step approach; they use a closed-form Hermite 
expansion (Ai't-Sahalia, 2002, 2008) to approximate the transition density, before using Gaussian 
quadrature to numerically integrate the conditional likelihood with respect to the random param¬ 
eters. As noted by Picchini and Ditlevsen (2011), the approach is, in practice, limited to a scalar 
random effect parameter since Gaussian quadrature is computationally inefficient when the dimen¬ 
sion of the random effect parameter grows. The methodology is extended in Picchini and Ditlevsen 
(2011) to deal with multiple random effects. A number of limitations remain however. In par¬ 
ticular a reducible diffusion process is required, that is, one which can be transformed to give a 
unit diffusion coefficient. Another drawback is that the method cannot account for measurement 
error. A promising approach appears to be the use of the extended Kalman hlter (EKF) to provide 
a tractable approximation to the SDMEM. This has been the focus of Overgaard et al. (2005), 
Tornpe et al. (2005) and Berglund et al. (2011). The R package PSM (Klim et ah, 2009) uses the 
EKF to estimate SDMEMs. Unfortunately, a quantihcation of the effect of using these approximate 
inferential models appears to be missing from the literature. Donnet et al. (2010) discuss inference 
for SDMEMs in a Bayesian framework, and implement a Gibbs sampler when the SDE (for each 
experimental unit) has an explicit solution. When no explicit solution exists they suggest that a 
solution might be found using the Euler-Maruyama discretisation. 

1.1 Contributions and organisation of the paper 

In this article we provide a method that permits (simulation-based) Bayesian inference for a large 
class of multivariate SDMEMs using discrete-time observations that may be incomplete (so that 
only a subset of model components are observed) and subject to measurement error. The method 
makes use of a novel scheme that allows for observations made sparsely in time, as the process of 
interest may exhibit nonlinear dynamics between measurement times. 

As a starting point, we consider a data augmentation approach that adopts an Euler-Maruyama 
approximation of unavailable transition densities and augments low frequency data with additional 
time points over which the approximation is satisfactory. Although a discretisation bias is in¬ 
troduced, this can be made arbitrarily small (at greater computational expense). Moreover, the 
approach is flexible, and is not restricted to reducible diffusions. A Bayesian approach then aims 
to construct the joint posterior density for parameters and the components of the latent process. 
The intractability of the posterior density necessitates simulation techniques such as Markov chain 
Monte Carlo. As is well documented in Roberts and Stramer (2001), care must be taken in the 
design of the MCMC sampler due to dependence between the parameters entering the diffusion 
coefficient and the latent process. We therefore adapt the reparameterisation technique (known as 
the modihed innovation scheme) of Golightly and Wilkinson (2008) and Golightly and Wilkinson 
(2010) (see also Stramer and Bognar (2011); Fuchs (2013); Papaspiliopoulos et al. (2013)) to the 
SDMEM framework. A key requirement of the scheme is the ability to sample the latent process 
between two hxed values. Previous approaches have typically focused on the modihed diffusion 
bridge construct of Durham and Gallant (2002). For the SDMEM considered in Section 5.2 we hnd 
that this construct fails to capture the nonlinear dynamics exhibited between observation times. We 
therefore develop a novel bridge construct that is simple to implement and can capture nonlinear 
behaviour. 

Finally, we provide a systematic comparison of our approach with an inference scheme based 
on a linear noise approximation (LNA) of the SDE. The LNA approximates transition densities 
as Gaussian, and when combined with Gaussian measurement error, allows the latent process 
to be integrated out analytically. Essentially a forward (Kalman) hlter can be implemented to 
calculate the marginal likelihood of all parameters of interest, allowing a marginal Metropolis- 
Bastings scheme targeting their posterior distribution. It should be noted, however, that evaluation 
of the Gaussian transition densities under the LNA require the solution of an ordinary differential 


2 


equation (ODE) system whose order grows quadratically with the number of components (say d) 
governed by the SDE. The computational efficiency of an LNA based inference scheme will therefore 
depend on d, and on whether or not the ODE system can be solved analytically. 

We apply the methods to two examples. Eirst, we consider a synthetic dataset generated from 
an SDMEM driven by the simple univariate model of orange tree growth presented in Picchini 
et al. (2010) and Picchini and Ditlevsen (2011). The ODE system governing the LNA solution is 
tractable in this example. Second, we fit a model of aphid growth to both real and synthetic data. 
The real data are taken from Matis et al. (2008) and consist of Cotton aphid [Aphis gossypii) counts 
in the Texas High Plains obtained for three different levels of irrigation water, nitrogen fertiliser 
and block. This application is particularly challenging, due to the nonlinear drift and diffusion 
coefficients governing the SDMEM and the ability to only observe one of the model components 
(with error). Moreover, the ODE system governing the LNA solution is intractable and a numerical 
solver must be used. 

The remainder of the article is organised as follows. The SDMEM framework is introduced in 
Section 2. Section 3 provides MCMC methods for Bayesian inference, with a novel bridge construct 
outlined in Section 3.2. The linear noise approximation and its application as an inferential model 
is discussed in Section 4. The methods are applied in Section 5 before conclusions are drawn in 
Section 6. 

2 Stochastic Differential Mixed-effects models 

Consider the case where we have N experimental units randomly chosen from a theoretical popu¬ 
lation, and associated with each unit f is a continuous-time d-dimensional Ito process {XI, t > 0} 
governed by the SDE 

dXi = a[Xl 9, 6*) dt + yJfi[Xi, 6, ¥) dWl, X* = x*, i = (1) 

Here, a is a d-vector of drift functions, the diffusion coefficient /3 is a d x d positive definite matrix 
with a square root representation such that = /3 and W/ is a d-vector of (uncorrelated) 

standard Brownian motion processes. The p-vector parameter 6 = (di,..., dp)^ is common to all 
units whereas the g-vectors 6* = [h\, ..., 6* )^, i = 1,... ,N, are unit-specific effects, which may 
be fixed or random. In the most general random effects scenario we let 7r(6*|^/)) denote the joint 
distribution of 6*, parameterised by the r-vector il) = [ipi,... ,'ijjr)'^. The model defined by (1) 
allows for differences between experimental units through different realisations of the Brownian 
motion paths VP/ and the random effects 6*, accounting for inherent stochasticity within a unit, 
and variation between experimental units respectively. 

We assume that each experimental unit {XI, t > 0} cannot be observed exactly, but observations 
d* = (dig, dq) • • •) dt„)^ are available and these are conditionally independent (given the latent 
process). We link the observations to the latent process via 

Y; = F^Xi + et, N[0,Y), (2) 

where 1)® is a do-vector, E is a constant dx do matrix and et is a random do-vector. Note that this 
setup allows for only observing a subset of components (do < d) and this aspect is explored further 
in Section 5.2. 

Together (1) and (2) completely specify the stochastic differential mixed-effects model. However, 
for most problems of interest the form of the SDE associated with each unit will not permit an 
analytic solution, precluding straightforward inference for the unknown parameters. We therefore 
work with the Euler-Maruyama approximation 

XXI = -xi = a[Xl, e, F) At + ^i3[xi,e,F) AWi 
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where ISWl ~ N{0, laAt) and At is the length of time between observations, assumed equally 
spaced for notational simplicity. It is, of course, unlikely that this approximation will be sufficiently 
accurate over the intervals between observation times and so we adopt a data augmentation scheme. 
Partitioning [tj,tj+i] as 


tj — 7j',0 ^ ^ < . . . < Tj,m—1 ^ '^j,m — ^i+1 


introduces m — 1 intermediate time points with interval widths of length 


At = Tj^k+i - Tj^k = — -■ (3) 

m 

The Euler-Maruyama approximation can then be applied over each interval of width At, and the 
associated discretisation bias can be made arbitrarily small at the expense of having to impute 
{XI} at the intermediate times. We adopt the shorthand notation 


1b,b+i] 


= X 


[id+i] 


= [x. 


y,o’ 




for the latent process over the time interval for unit i. Hence, the complete latent trajectory 

associated with unit i is given by 


and we stack all unit-specific trajectories into a matrix x = (x^,..., x'^). Likewise the matrix 
y = (y^,..., y^) denotes the entire set of observations. Next we focus on how to perform Bayesian 
inference for the model quantities x, 9, b = ..., b^)"^, -0 and S. 


3 Bayesian inference 

The joint posterior for the common parameters 0, fixed/random effects b, hyperparameters ip, 
measurement error variance S and latent values x is given by 


tt( 9, Ip, S, b, x\y) oc 7r{9)Tr{ip)Tr{T.)Tr{b\ip)Tr{x\9, b)Tr{y\x, S) (4) 

where Tr{9)Tr(ip)Tr{Ti) is the joint prior density ascribed to 9, ip and S. In addition we have that 

N n—1 m 


7T 


{x\9, &)=n n n 

i=l j=0 k=l 


(5) 


where 


= N ; x;^. + a(x;,^^_^ 9, 6*)At, ^{ x }. ,_^, 9 , 5*)Ar^ 

and N{- ] m,V) denotes the multivariate Gaussian density with mean m and variance V. Similarly 


N n 


TT y X, 2. = 


7=1 j=0 


where TT{y\.\x\.,Ti) = N{yl. ; x\^,Ti). Given the intractability of the joint posterior distribution in 
(4) we aim to construct a Markov chain Monte Carlo (MCMC) scheme which generates realisations 
from this posterior. The form of the SDMEM admits a Gibbs sampling strategy with blocking that 
sequentially takes draws from the full conditionals 
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1. 7r(x|6»,V’,S,6,?/) = 7r(x|6»,S,6,y), 
3. 7r(6'|V’,S,6,x,y) = 7r(6l|6,x), 

5. 7r(V’|6',S,6,x,y) = 7r(V^|6). 


2. 7r(S|6»,V’,&,x,y) = 7r(S|x,?/), 
4. 7r(6|6',V’,S,x,y) = 7r(6|6',V^,x) 


Further blocking strategies that exploit the conditional dependencies between the model param¬ 
eters and latent trajectories can be used. For example, in step 1 the latent trajectories can be 
updated separately for each experimental unit. Likewise, the unit-specific random effects can be 
updated separately. Where necessary, Metropolis-within-Gibbs updates can be used. We note that 
as written, this scheme will mix intolerably poorly as the degree of augmentation m is increased 
due to dependence between the latent values x and the parameters entering the diffusion coefficient 
(namely 9 and h). We refer the reader to Roberts and Stramer (2001) for a detailed discussion 
of this problem. A simple mechanism for overcoming this issue is to update the parameters and 
latent trajectories jointly (and this has been considered for SDE models by Stramer and Bognar 
(2011) and Golightly and Wilkinson (2011)). For SDMEMs a joint update of 0, b and x is likely to 
result in a sampler with low acceptance rates. We therefore wish to preserve the blocking structure 
described above and instead adapt the reparameterisation of Golightly and Wilkinson (2008) to 
our problem. In what follows, we describe in detail each step of the Gibbs sampler. 

3.1 Path updates 

The full conditional density of the latent paths for all experimental units is given by 

N 

7r(x|0, E, 6, y) oc 7r(x|0, 6)7r(y|x, ^) = ]^ 7r(x*|0,5*)7r(y*|x*, S) 

i=l 

which suggests a scheme where unit-specific paths are updated separately. We now focus on an 
updating scheme for a single path, and drop i from the notation, writing x in place of x* and 
in place of x ]^Since the parameters are fixed throughout this updating step, we also 
drop them from the notation. 

Following Golightly and Wilkinson (2008) we update x in overlapping blocks of size 2m + 1. 
Gonsider times tj and tj +2 at which the current values of the latent process are xtj and xtj_f _2 ■ The 
full conditional density of the latent process over the interval {tj,tj+ 2 ) is given by 

j+l m 
1=3 k=l 

Under the nonlinear structure of the diffusion process, this full conditional is intractable and so we 
use a Metropolis-Hastings step to generate draws from (6). We use an independence sampler with 
proposal density of the form 

’ ^L+2) ) ^ij+i) ^^(^(j-i-i j-i-2) ) ^tj+2)• ('^) 

Figure 1 gives an illustration of the updating procedure which can be applied over intervals {tj,tj+ 2 ), 
j = 0, l,...,n — 2, with two additional Metropolis-Hastings steps (such as those described in 
Golightly and Wilkinson (2006)) that allow for updating x at times to and tn- Deriving appropriate 
forms for qi and q 2 requires the ability to (approximately) generate a discrete-time realisation of a 
diffusion process between two time points at which the process is either observed exactly or subject 
to Gaussian noise. The resulting trajectory is typically referred to as a diffusion bridge. 

Several strategies for constructing diffusion bridges have been proposed in the literature. For 
example, Pedersen (1995) used the Euler-Maruyama scheme to generate bridges myopically of the 
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Propose using qi 


Propose using <72 


• • • 

ytj ytj+i ytj+2 

Figure 1: Path update illustration over a block of size 2m + 1. 


end point. Durham and Gallant (2002) use a linear Gaussian approximation of the distribution of 
the process conditional on the value at a previous and future time point, giving a construct known as 
the modified diffusion bridge. Extensions of this construct to the case of partial observation with 
additive Gaussian noise can be found in Golightly and Wilkinson (2008). Whilst this construct 
can, in principle, be applied to arbitrary nonlinear multivariate diffusion processes, the effect of 
the Gaussian approximation is to guide the bridge towards the observation in a linear way, unless 
there is large uncertainty in the observation process. This effect is exacerbated in the case of 
no measurement error, in which case the resulting construct is independent of the drift of the 
target process. Consequently, use of the modified diffusion bridge as a proposal mechanism (in a 
Metropolis-Hastings independence sampler) is likely to result in low acceptance rates, unless the 
drift is of little importance in dictating the dynamics of the target process between observation 
times. Several attempts to overcome this issue have been proposed in the recent literature. 

A time-dependent combination of the Pedersen and modified diffusion bridge approaches was 
proposed by Lindstrom (2012). However, the resulting construct requires a model specific tuning 
parameter governing the relative weight of each contribution (either Pedersen or modified diffusion 
bridge). Moreover, the optimal value (in terms of maximising acceptance rate) may vary between 
observation intervals. Beskos et al. (2013) use Hybrid Monte Carlo (HMC) on pathspace to generate 
SDE sample paths under various observation regimes. Eor the applications considered, the authors 
found reasonable gains in overall efficiency (as measured by minimum effective sample size per CPU 
time) over an independence sampler with a Brownian bridge proposal. However, we note that HMC 
also requires careful choice of tuning parameters (namely the number of steps (and their size) in 
the leapfrog integrator) to maximise efficiency. Schauer et al. (2014) (see also Papaspiliopoulos and 
Roberts (2012)) combine the ideas of Delyon and Hu (2006) and Clark (1990) to obtain a bridge 
based on the addition of a guiding term to the drift of the target SDE. The guiding term requires a 
tractable approximation of the unavailable transition densities governing the target process over the 
length of the inter-observation interval. Schauer et al. (2014) suggest using the transition densities 
associated with a class of linear processes, although we note that finding an approximation that 
is both accurate and computationally efficient may be difficult in practice. Moreover, such an 
approximation can suffer from computational efficiency due to the fact that it must be obtained at 
each intermediate time point. 

In the next section we describe a novel bridge construct that requires no tuning parameters, is 
simple to implement (even when only a subset of components are observed with Gaussian noise), 
computationally efficient and explicitly allows for the effect of the drift governing the target SDE. 

3.2 An improved bridge construct 

Consider a typical interval partitioned into m sub-intervals as in (3), over which we wish 

to generate a realisation of {W, t G [tj,tj+i]} conditional on xt^ = Xr^ q noisy measurement 

= Urj m- Our approach builds on the modified diffusion bridge of Durham and Gallant (2002), 
which we briefly review before describing our extension. 
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3.2.1 Modified diffusion bridge 


Key to constructing the modified diffusion bridge is an approximation of the joint distribution of 
Xr-f. and (conditional on Xr Under the Euler-Maruyama approximation Xr- f,\Xr. 

and lij+i l^r k Gaussian, with the expressions for the mean and variance of the latter evaluated 
at Xt-. to give a linear Gaussian structure. This leads to the approximation 




i. 1 ~ 


N 


f +ai,fc-iAT \ 

/ /3j-fc-iAr Pj^k-iFAr \ 1 

VF^/3,-fc_iAr F^f3j,k-iFA- + / 


( 8 ) 


where A“ = tj+i — Tj^k-i and we have used the shorthand notation a{xTj f,_i) = and 

/3(a^r,j,_i) = f3j,k-i- Gonditioning further on ytj+i gives a Gaussian approximation of 
TT{xr^,^\xrj,^_j^,ytj+i), denoted n{xrj Jxrji^_j^,ytj+i), which can be sampled recursively to give a 
bridge Xr ^, ■ ■ ■, Xr ,^- In the case of no measurement error and observation of all components (so 
that ytj+i = and F = 1^, the d x d identity matrix), we obtain 


n{Xr^ JXrj_,_„Xt^^,) = N 


k 1 


+ 


Xt — Xr^ u 1 

■ _fU-lAr 

tj + l Xj^k—l 


^i+1 U.fc 

tj+l — Xj^k-l 


Pj,k—1 



which is the form of the modified diffusion bridge first described by Durham and Gallant (2002). In 
this case, Tr{xrj f.\xr^ i^_-^,ytj+i) can be seen as a linear approximation of the Brownian bridge SDE 


dXt = f* dt + VPiXt) dWt. (9) 

tj+i — t 

Use of (9) has been justified by Delyon and Hu (2006), who show that the distribution of the 
target process (conditional on xt.^-^) is absolutely continuous with respect to the distribution of the 
solution to (9). We may therefore expect that a Metropolis-Hastings scheme that uses a proposal 
based on a discretisation of (9) will yield a non-zero acceptance rate as At —0 (for a rigorous 
treatment of the limiting forms, we refer the reader to Delyon and Hu (2006), Stramer and Yan 
(2007) and to Papaspiliopoulos and Roberts (2012) for a recent discussion). However, it should 
also be noted that the linear drift function governing (9) is independent of the the drift function 
a(-) governing the target process. Gonsequently, in situations where realisations of the target SDE 
(with the same initial condition) exhibit strong and similar nonlinearity over the inter-observation 
time, the modified diffusion bridge is likely to be unsatisfactory. 


3.2.2 Residual bridge 

To allow explicitly for dynamics based on the drift, we partition Xt into two parts, one that 
accounts for the drift in a deterministic way, and another as a residual stochastic process. The 
modified diffusion bridge is then applied to the residual stochastic process rather than the target 
process itself. The partition we require is 


Xt — rjt F Rti 

where £ [^iUj+i]} is a deterministic process satisfying the ODE 

dry , X 

dt,=xt^, 

and {Rtit G is a residual stochastic process satisfying 

dRt = dXt - dry = {a{Xt) - a{r^t)}dt + ^/^(W) dWf 


( 10 ) 


( 11 ) 


( 12 ) 
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Figure 2: An illustration of the improved bridge construct. Left: The full bridge. Right: A sample 
path of Rt- 


We note that the partition in (10) is used by Fearnhead et al. (2014) (see also Section 4) to derive a 
tractable approximation to the intractable transition densities governing Xt, whereas our primary 
motivation for (10) is the application of the modified diffusion bridge construct to the residual 
process, thus giving a proposal that is likely to perform well for arbitrarily fine discretisations and 
explicitly incorporates the drift of the target SDE. Therefore, we aim to derive an approximation 
j,_i 52 /tj+i), that can be sampled recursively for k = l,...,m and combined with the 
deterministic process (through numerical solution of (11)) via (10) to give a bridge Xrj q, ■ ■ ■, Xrj 
The scheme is illustrated in Figure 2. 

The initial condition r]t = xt- together with the Gaussian measurement error process imply 
that r/. = 0 and 

^3 


iV(0, S). 

Hence, it should be clear that the joint distribution of Rr^ and conditional on 

Rr -= W fc_i can be approximated as 


R. 


'Tj,k 


- F 


rr,. ,_1 ~ N 


/ /?j,fc_iAT Pj^k-iFAr 


where Conditioning further on - F'^r]t-_^_^ gives 


(13) 


where 


?T„ rz7T„ , 77T/ a; 


t^j,k = + (aj.fc-i - aJ,fc_i)AT + ^j^k-iFA t{F^P j^k-iFA + S) 


-1 


X (yi^+i - F - {F + F - a2fc_i)A }) 


(14) 


and 


'^j,k = Pj,k-iAT - Pj^k-iFAT{F'^(3j^k-iFA + S) ^F'^/Sj^k-iAr. 


(15) 







Together (13)-(15) define our bridge construct. These can be used to define the proposal mechanism 
in (7) for generating {Xt,t G [tj,tj+ 2 ]} by taking 

m 

k=l 


and 


k=l 




where — r/rj+i*.kr^+u-.n3^t^+2) be sampled using (10) and (13)-(15) with j replaced 

by j + 1, S = 0 and F = 1^. 

In the special case of no measurement error and observation of all components we have that 


7r(r. 


'^3ik ' ' 1 ’ + l 


+ N (rr,,] rr,k-l + 




At, 


^i+i '^j,k 


— ^ j. _ 

which can be seen as a linear approximation of the Brownian bridge SDE 

dRt = ~ f* + dWf 

tj+i — t 


/3j,fe_iAr 


(16) 


We also note that (16) has the same diffusion coefficient as the target process and appeal again 
to Delyon and Hu (2006), to deduce that the distribution of the residual process governed by (12) 
(conditional on is absolutely continuous with respect to the distribution of the solution to 

(9). 


3.3 Parameter updates 

The full conditional densities of S and V’ are 

7r(S|x, y) oc 7r(S)7r(y|S) and Tr(il!\b) (x 

Often, semi-conjugate priors can be specified for E and negating the need for Metropolis-within- 
Gibbs steps. For the remaining parameters 0 and b = (6^,..., b^Y' have 

N 

T{9\b, x) oc 'k{9)tt{x\ 9, b) and 7r(6|0, 'ijj, x) oc 7r(6|V')7r(x|0, ^) = B 7r(6*|V’)vr(x*|0, F) 

i=l 

where the last expression suggests unit-specific updates of the components of b. 

As discussed earlier, since 9 and the components of b enter into the diffusion coefficient of 
(1), sampling the full conditionals of 9\b,x and b\9,'tp,x as part of a Gibbs sampler will result 
in a reducible Markov chain as m —>■ 00 (or Ar —)• 0). To overcome this problem we use a 
reparameterisation which is outlined in the next section. 


3.3.1 Modified Innovation scheme 

The innovation scheme was first outlined in Ghib et al. (2004) and exploits the fact that, given 
9 and b, under the Euler-Maruyama approximation there is a one-to-one relationship between 
the increments of the process (AAt) and the increments of the driving Brownian motion (AkFt). 
Moreover, whilst the quadratic variation of A determines 9 and b (as m —)• 00 ), the quadratic 
variation of the Brownian process is independent of 9 and b a priori. Conditioning on the Brownian 
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increment innovations in a Gibbs update should therefore be effective in overcoming the dependence 
problem. The resulting algorithm is known as the innovation scheme. Unfortunately, combining an 
updated parameter value with the Brownian increments will not necessarily give an imputed path 
that is consistent with the observations. Therefore, Golightly and Wilkinson (2008, 2010) suggest 
that a diffusion bridge (such as the modified diffusion bridge of Durham and Gallant (2002)) be 
used to determine the innovation process, leading to a modified innovation scheme. 

Fuchs (2013) considers the modified innovation scheme in a continuous-time framework. Adapt¬ 
ing their innovation process to an SDMEM, we have, for an interval [tj, fj+i], an innovation process 
{Zl,t e satisfying 

dZl = (3{Xi, 0, 6 *)- 1/2 (^dXl - d?j , ( 17 ) 

= (3{Xl, 0, 5*)-i/ 2 |a(X^ 0, U) - ^dt + dWl 


with Zl. = 0. Glearly, each process Z* has unit diffusion coefficient and whilst not Brownian motion 
processes, the probability measures induced by each Z* are absolutely continuous with respect to 
Wiener measure. A proof of this result can be found in Fuchs (2013) as well as a justification for 
using this form of innovation process as the effective component in a Gibbs sampler. 

The aim is to apply a discretisation of (17) between observation times. We therefore define 
Xq = (xjg,..., x\J)^ to be the current values of the (unit-specific) latent process at the observation 
times, and stack all xj, values into the matrix Xq- We have for k = 1,... ,m 






- 1/2 


‘ j,k ‘ — l 


^3 + 1 


— X'^ 

Tj^k-1 




At 


where Zr^ o ~ ^ and 


‘'j+1 ' j,k—l 


Note that our discretisation of (17) follows Golightly and Wilkinson (2008) by using the modified dif¬ 
fusion bridge to construct the innovation process. Now define a function / so that X'l, = f{Zl., , 0, 6*) 
and Zl. ^ = f~^{Xl. ^,0,0"^). Let zl^p denote the (unit-specific) innovation values over [to,tn] and 
stack all zl^p values into the matrix Zimp- Define and Ximp similarly. The modified innovation 
scheme samples 0\b, Zimp, Xq and 6*|0, ■0, 0^^, x*, / = 1,..., iV. Note that for an updated value of 
6*, say 6**, a new is updated deterministically through x^J^p = /(00p, 6*,/)**)■ Likewise, for a 
new 0*, a new x*^p is updated deterministically through x-0p = /(00p, 0* ■, f>*), i = 1,... ,N. The 
full conditional density of 0 is 


N n—1 


7r(0|5, Zimp, Xo) oc 7r(0) nn 


2=1 j = l Lk=l 


m—1 




where 


k=l 


- 1/2 


(18) 


j{/(4^.^,0,5*)}= r(x;^.^,0,0) 

is the Jacobian determinant of /. Similarly, the full conditional density of 6*, i = 1,..., A^ is 


n—1 


Tr{b^\0,il,zlmp,xl,) oc 7r(0|0) 


j=i Lfc=i 


m—1 




k=l 


(19) 


Naturally, the full conditionals in (18) and (19) will typically be intractable, requiring the use of 
Metropolis-within-Gibbs updates. 
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4 Linear noise approximation 


In this section we outline a competing solution which uses an inference scheme based on a linear noise 
approximation (LNA) to the SDMEM. The LNA typically refers to an approximation to the solution 
of the forward Kolmogorov equation governing the transition probability of a Markov jump process 
(Kurtz, 1970; Term et ah, 2008; Komorowski et ah, 2009; Finkenstadt et ah, 2013). Specifically, the 
forward Kolmogorov equation is approximated by a Fokker-Planck equation with linear coefficients. 
Equivalently, a general Fokker-Planck equation can be deduced and then linearised. In this context, 
therefore, the LNA aims to replace intractable transition densities with Gaussian approximations. 
In what follows, we give a brief informal derivation of the LNA and refer the reader to Fearnhead 
et al. (2014) and the references therein for further details. 

4.1 Setup 

For notational simplicity and clarity of exposition, we suppress parameter dependence and the 
unit-specific i for the remainder of this sub-section. 

Without loss of generality, consider a time t G [tj,tj^i] at which we wish to approximate the 
intractable transition density associated with Xt\Xt. = xty The LNA uses the same partition of 
Xt given in (10), that is Xt = rjt + Rt where the deterministic process rjt satisfies (11) and the 
residual stochastic process satisfies (12). The key assumption underpinning the LNA is that the 
residual stochastic perturbation is “small” relative to the deterministic process, allowing suitable 
truncation of a Taylor series expansion of a{Xt) and /3(Xt) about ijt- Taking the first two terms in 
the expansion of a{Xt), and the first term in the expansion of P(Xt) gives an SDE satisfied by an 
approximate residual process {Rt,t G [tj,tj+i\} of the form 

dRt = HtRt dt + dWu (20) 

where Ht is the Jacobian matrix with (i,j)th. element = dai{r]t)/drjj^f 

Assuming fixed or Gaussian initial conditions Rt ~ N{mt ,Vt ) gives Rt ~ N{mt,Vt), where 
nit and V) satisfy the ODE system 

^=H,V,+fl{r,„e,b) + V,H[. (22) 

In the absence of an analytic solution, the system of coupled ODEs (11) and (21)-(22) which 
characterise the LNA, must be solved numerically. For initial conditions rjt = xt , we have 
nit^ = 0 and Vt- = 0 so that (21) does not need to be solved, and the approximating transition 
distribution is Xt\Xt. = xt^ ~ A^(?/t, Vt)- 

It is worth noting here that the linear form of the SDE (20) satisfied by the approximate residual 
process coupled with the additive Gaussian observation regime admits a closed form expression for 
densities of the form suggesting use of the LNA as a proposal mechanism 

inside the Bayesian imputation approach of Section 3. Whilst the LNA could in principle be used 
to directly approximate the conditioned residual process governed by the SDE in (12) we note 
that the SDEs in (12) and (20) have different diffusion coefficients. Gonsequently, the probability 
law governing Rt is not absolutely continuous with respect to the law of Rt- We therefore do not 
advocate use of the LNA in this way. 

In the next section we outline an inference scheme for SDMEMs of the form (1) based on the 
LNA. It exploits the computational efficiency of a filtering algorithm proposed by Fearnhead et al. 
(2014) that allows closed-form calculation of the marginal likelihood vr(y|0, b, S) under our Gaussian 
observation regime (2); see the supplementary material for further details. 
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4.2 Application to SDMEMs 

Under the linear noise approximation of (1) the marginal posterior for all parameters is given by 

7r(6', V', S, b\y) oc 7r(6»)7r(V’)vr(S)7r(6|V’)7r(y|6', S, b) 

N 

OC 7r(0)7r(V’)'?r(S) 7r(fe*|V’)7r(y*|0, S, 6*). 

i=l 

This factorisation suggests a Gibbs sampler with blocking that sequentially takes draws from the 
full conditionals 7r(S|0, V', 6, ?/) = 7r(S|y), 7r{0\ip,J^,b,y) = TT{6\b,y),Tr{b\0,i/j,T,,y) = 7r{b\0,ip,y) 
and 7r{ijj\0,T,,b,y) = 7r{'ip\b). A Metropolis-Hastings step can be used when a full conditional 
density is intractable. An algorithm for computing the marginal likelihood 7r(y*|0, S, 6*) for each 
experimental unit is given in the supplementary material. Interest may also lie in the joint posterior 
Tr{0,Ip, T,,b,x\y) where, since no imputation is required for the LNA, x* = {xtQ,. ■ ■ and 

X = (x^,. .. ,x^). Realisations from this posterior can be obtained using the above Gibbs sampler 
with an extra step that draws from 7r(x*|0, V', S, 6*, y®) = 7r(x®|6*, S, 6®, y®) for i = An 

efficient mechanism for making such draws can also be found in the supplementary material. The 
method uses a forward filter, backward sampling (FFBS) algorithm. 

5 Applications 

We now compare the accuracy and efficiency of our Bayesian imputation approach (coupled with 
the modified innovation scheme) with an LNA-based solution. We consider two scenarios: one in 
which the ODEs governing the LNA are tractable and one in which numerical solvers are required. 
In the first we use synthetic data generated from a simple univariate SDE description of orange 
tree growth. The second example uses real data taken from Matis et al. (2008) to fit an SDMEM 
driven by the bivariate diffusion approximation of a stochastic kinetic model of aphid dynamics. 
The resulting SDMEM is particularly challenging to fit as both the drift and diffusion functions 
are nonlinear and also only one component of the model is observed (with error). We also include 
(in the supplementary material) a simulation study based on synthetic data generated from the 
model of aphid dynamics, to explore further any differences between the Bayesian imputation and 
LNA-based approaches. 

5.1 Orange tree growth 

The SDMEM developed by Picchini et al. (2010) and Picchini and Ditlevsen (2011) to model orange 
tree growth describes the dynamics of the circumference (XI) of individual trees (mm) by 

dXi = ^XiiP\-Xi)dt + aJj^dWI, W®=x^, i = l,...,N 

with 4>\ ~ N(4)i,a'^^) and 02 ~ X((j) 2 , independently. Here 0 = cr is common to all trees, the 
random effects are 6® = (0j, 02 )^) * ~ 1,..., A^ and the parameter vector governing the random 
effects distributions is 0 = (0i, 02, , fx^j)^. Note that the 0j can be interpreted as asymptotic 

circumferences and the (j)\ as the time-distance between the inflection point of the model obtained 
by ignoring stochasticity and the point where XI = 0j/(l + e~^). 

To allow identifiability of all model parameters we generated 16 observations for the circumfer¬ 
ence of A = 100 trees at intervals of 100 days. Following Picchini and Ditlevsen (2011) we gave each 
tree the same initial condition (xq = 30) and took (0i, 02, cr,^^, w) = (195,350,25,52.5,0.08), 
which gives random effects distributions 0j ~ A(195,25^) and (j)\ ~ A(350, 52.5^). For our anal¬ 
ysis of these data we assumed the parameters to be independent a priori with 0i and 02 having 
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weak A^(0,100^) priors and having weak gamma Ga(l,0.01) priors. In this 

example we assume there is no measurement error and therefore the target posterior is given by 


7r(0,-0, oc 7r(0)7r(V')7r(6|?/))7r(x|0, 6). 

In the Bayesian imputation approach, 7r(x|0,5) is as in (5) whereas for the LNA-based solution 

N n—1 

i=l j=0 


where, for each interval [tj,tj+i] and each tree i, the rjl and satisfy the ODE system 




dVi 


^ = + ^Z,=o. 


dt 




Fortunately this ODE system can be solved analytically giving r/l = A(f>\e^/^'^ j(1 + and 

EZ = B + 3At - + <i>Cj 


where A = — x\^) and B = j{1 + Ae*/'^2)4_ 

The MCMC scheme can make use of simple semi-conjugate updates for (/>!, (^ 2 ) and cr^j- 
However the remaining parameters {a and the B) require Metropolis-within-Gibbs updates and we 
have found that componentwise normal random walk updates (so-called random walk Metropolis) 
on the log scale work particularly well. Also, for the modified innovation scheme, the dynamics 
of the SDMEM permit the use of the modified diffusion bridge construct to update the latent 
trajectories between observation times; the improved bridge construct of Section 3.2 is not needed. 

The modified innovation scheme requires specification of the level of discretisation m. We 
performed several short pilot runs of the scheme with m G {5,10, 20, 40} and found no discernible 
difference in posterior output for m > 10. We therefore took m = 10. The sample output was also 
used to estimate the marginal posterior variances of a and the B, to provide sensible innovation 
variances in the random walk Metropolis updates. Both the modified innovation scheme and the 
LNA-based scheme required a burn in of 500 iterations, a thin of 100 iterates and were run long 
enough to yield a sample of approximately lOK independent posterior draws. Figure 3 shows the 
marginal posterior densities and autocorrelations for the common parameter a and the parameters 
governing the random effects distributions. The marginal posterior means and standard deviations 
of ((/>!, (/>2, cr0^, cj(^ 2 )'^) given in Table 1. The figures and table show that for these parameters 
both the imputation approach and LNA-based approach generally give similar output and are 
consistent with the true values from which the data were simulated. Similar results are obtained 
for the random effects parameters (see the supplementary material). 

Both schemes were coded in C and run on an Intel Xeon 3.0GHz processor; the modified 
innovation scheme took 43504 seconds to run whilst the LNA inference scheme took 2483 seconds. 
We use the minimum (over each parameter chain) effective sample size (minESS) to measure the 
statistical efficiency of each scheme. The modified innovation scheme produced a minESS of 7949 
and the LNA-based approach gave 7821. Therefore, in terms of minESS/sec, using the LNA 
outperforms the imputation approach in this example by a factor of approximately 17. It should 
be noted, however, that for most nonlinear SDMEMs the ODEs governing the LNA solution will 
rarely be tractable and the consequent use of numerical schemes will degrade its performance. 

In the next section we consider an example in which the LNA ODEs are intractable. 
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Figure 3: Marginal posterior densities for the random effects hyper-parameters ^ 2 , cr^^) 

and common parameter a in the orange tree growth SDMEM, together with their (overlayed) 
autocorrelation functions. Black: Bayesian imputation. Red: LNA. The vertical grey lines indicate 
the ground truth. 



4>i 

4>2 

*^01 

<7(^2 

a 

Imputation 

194.229 

344.799 

24.316 

53.219 

0.079 

(3.509) 

(10.098) 

(3.149) 

(10.410) 

(0.002) 

LNA 

194.634 

(4.025) 

347.631 

(10.844) 

24.207 

(3.154) 

53.960 

(10.193) 

0.079 

(0.002) 


Table 1: Marginal posterior means (standard deviations) of the random effects hyper-parameters 
{4>i, (/> 2 , cr^^) and common parameter a in the orange tree growth SDMEM. The synthetic data 

used (j)! = 195, 4>2 = 350, a^j)^ = 25, cr <^2 = 52.5 and a = 0.08. 


5.2 Cotton aphid dynamics 
5.2.1 Model and data 


Aphids (also known as plant lice or greenfly) are small sap sucking insects which live on the leaves 
of plants. As they suck the sap they also secrete honey-dew which forms a protective cover over 
the leaf, ultimately resulting in aphid starvation. Matis et al. (2006) describe a model for aphid 
dynamics in terms of population size [Nt) and cumulative population size {Ct). The model is a 
stochastic birth-death model with linear birth rate A At and death rate ^NtCt- The key probabilistic 
laws governing the time-evolution of the process over a small interval {t,t + dt\ are 

Pr(At+rft = nt + l, Ct+dt = ct + l\ nt,ct) = Xnt dt + o{dt), 

Pv{Nt+dt = nt - 1 , Ct+dt = Ct I nt, ct) = /untCt dt o{dt). 


The diffusion approximation of the Markov jump process dehned by (23) is 


(dNt 

\dCt 


/AAt - fiNtCA ^ /AAt + idNtCt 
V XNt J y AAt 


AAtA 

XNtJ 


1/2 


dWf 


(24) 
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Matis et al. (2008) also provide a dataset of cotton aphid counts collected from three blocks (1/2/3) 
and using treatments constructed from two factors: water irrigation (low/medium/high) and nitro¬ 
gen (blanket/variable/none). The data were collected in July 2004 in Lamesa, Texas and consist of 
five observations of aphid counts aggregated over twenty randomly chosen leaves in each plot for the 
twenty-seven treatment-block combinations. The data were recorded at times t = 0, 1.14, 2.29, 3.57 
and 4.57 weeks (approximately every 7/8 days). 

We now formulate an appropriate SDMEM model driven by (24) for these data and then fit the 
model. For notational simplicity, let i,j,k denote the level of water, nitrogen and block number 
respectively with i,j,k G {1,2,3}, where 1 represents low water/blanket nitrogen, 2 represents 
medium water/variable nitrogen and 3 represents high water/zero nitrogen. Let denote the 
number of aphids at time t for combination ijk and the corresponding cumulative population 
size. We write consider the SDMEM 


where 




i,j,k G {1,2,3}, 


a{xi^\ ^Lfc) 
/3(Xp,6*^'") 


yikN^Ok j ’ 


The fixed effects 6*-^^ = (A*-^^,have a standard structure which allows for main factor and 
block effects and single factor-block interactions, with 

= A -h Avv, -k Aat. -k Asfe + XwNij + >'WBik + Aats.^ 

^Ijk = n-\- -I- -g -I- + fJ'WBii, + MAfSjfc • 

Also for identifiability we use the corner constraints Xwi = Atv^ = A^i = 0, XwNij = XwNiji^ — 
^WBik = ^WBiki'^ - kiik) and Xnb^^ = XNBj^i'^ - i^jk), where = max(Jir, Jis) and Ji. is the 
Kronecker delta, with equivalent constraints on the death rates. The interpretation of (25) is 
straightforward. For example, A^^^ = A and = /r are the baseline birth and death rates 
inferred using all 5 X 3^ = 135 observations, and correspond to the treatment combination low 
water, blanket nitrogen and block 1. Likewise, all 5 x 3^ = 45 observations taken from block 2 
inform the main effects of block 2 {Xb 2 and relative to the baseline. 

A related approach can be found in Gillespie and Golightly (2010), where the diffusion approxi¬ 
mation is eschewed in favour of a further approximation via moment closure. Our approach further 
differs from theirs by allowing for measurement error and leads to a much improved predictive fit. 
The measurement error model is in part motivated by an over-dispersed Poisson error structure 
which we then approximate by a Gaussian distribution. Specifically, we assume that aphid pop¬ 
ulation size Nt is observed with Gaussian error and that the error variance is proportional to the 
latent aphid numbers, giving 


yr\Ni- 




ijk 


t = 0,1.14, 2.29, 3.57,4.57. 


(26) 


5.2.2 Implementation 

Our prior beliefs for l/fi^ are described by a Ga{a,a) distribution. We found little difference in re¬ 
sults for a G {0.01, 0.1,1} and so here we report results for a = 1. The prior for the elements in (25) 
consists of independent components subject to the birth and death rates for each treatment com¬ 
bination (A®-^*^, ^*-^^) being positive. The baseline rates A and ^ must be positive and so, following 
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Improved bridge construct 


Modified diffusion bridge 




Figure 4: 95% credible region (dashed line) and mean (solid line) of the true conditioned aphid 
population component 2 / 4,57 (red) and two competing bridge constructs (black). 


Gillespie and Golightly (2010), we assign weak t/(—10,10) priors to log A and log/r and also to the 
remaining parameters. We also take a fairly weak N (24, 90) prior for each and use a proposal 
of the form N{NtQ,(T‘^Nt(^) for updates. The cumulative population sizes must be at least as large as 
their equivalent population size. However, we do not expect them to be greatly different a priori. 
We investigated using a truncated distribution of the form CtolWo ~ Ctg > Wo as the 

prior and found that this led to little difference in posterior output for dc £ {Ij 10,100}. We have, 
therefore, chosen to fix in our analysis. Note that the form of the prior for a gives a 

semi-conjugate update. The remaining parameters in (25) are updated using random walk Metropo¬ 
lis on the pairwise X, fi component blocks (A,/r), {Xw 27 iJ‘W 2 )^ (-^VFs)/^Wa)) • • • i {^NBss, iJ-NBss)- 

The nonlinear form of the observation model (26) can be problematic for the modified innovation 
scheme. In particular, the proposal mechanism for the path update requires an observation model 
that is linear in W- Therefore, when proposing from the bridge construct in Section 3.2, we replace 
S in (14) and (15) with where rjtj+i = is the solution of (11). Since the 

proposal mechanism is corrected for via the Metropolis-Hastings step, no additional approximations 
to the target distribution are needed. 

In order to obtain a statistically efficient implementation of the modified innovation scheme, 
we investigate the performance of the modified diffusion bridge construct of Durham and Gallant 
(2002) and our improved bridge construct of Section 3.2 in a scenario typical of the real dataset. 
Using the simulation study of Gillespie and Golightly (2010), we take (A,/i)^ = (1.75,0.00095)^, 
xq = (28,28)^ and recursively apply the Euler-Maruyama approximation to give 
3 ^ 3.57 = (829.08,1406.07)^. We then compare the performance of each bridge construct over the 
final observation interval [3.57,4.57] by taking 2 / 4.57 as the median of (26) with cr = 1. Figure 4 
shows 95% credible regions of the true conditioned process iVi|x 3 . 57 , 2 / 4,57 (found via Monte Carlo 
simulation) with 95% credible regions obtained by repeatedly simulating from the modified diffusion 
bridge and our improved construct. It is clear that the modified diffusion bridge fails to adequately 
account for the nonlinear behaviour of the conditioned process. Use of each construct as a pro¬ 
posal mechanism inside a Metropolis-Hastings independence sampler (lOOK iterations) results in 
an acceptance rate of around 58% for the improved bridge construct and just 1% for the modified 
diffusion bridge. It is for these reasons that the modified diffusion bridge is eschewed in favour of 
our improved bridge construct when applying the Bayesian imputation approach. 

Finally, fitting the LNA requires the solution of an ODE system given by (II) and (22) where 
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Figure 5: Marginal posterior densities for a selection of the aphid model parameters. Black: 
Bayesian imputation. Red: LNA. 


the Jacobian matrix is 

^ [ X 0 )■ 

This ODE system is intractable and so our C implementation uses a standard ODE solver from 
the GNU scientific library, namely the explicit embedded Runge-Kutta-Fehlberg (4, 5) method. 
Note that the tractability of the marginal likelihood under the LNA requires a linear Gaussian 
observation model. Therefore, when applying the FFBS algorithm in the supplementary material, 
we make an approximation to the marginal likelihood calculation by replacing S with 

5.2.3 Results 

The time between observations is almost but not quite constant and so we have allowed each 
interval to have its own discretisation level, m. That said, the interval-specific values vary very 
little, and by at most two for the larger m values. Several short pilot runs of the modified innovation 
scheme were performed with typical m G {5,10, 20,40, 50}. These gave no discernible difference in 
posterior output for m > 20 and so we took m = 20. The sample output was also used to estimate 
the marginal posterior variances of the A, /U component blocks of the parameters in (25), to be used 
in the random walk Metropolis updates. Both the modified innovation scheme and MCMC scheme 
under the LNA were run for 40M iterations with the output thinned by taking every 4Kth iterate 
to give a final sample of size lOK. 
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Figure 5 shows the marginal posterior densities of the baseline parameters, the parameter a 
controlling the observation error variance and a selection of the remaining parameters. As in 
Gillespie and Golightly (2010) we find that block 2 plays an important role. The 95% credible 
regions for the main block 2 death rate, and \nb 22 ^ the birth rate characterising the interaction 
with nitrogen, are plausibly non-zero. Whilst the imputation approach and LNA generally give 
consistent output, there are some notable differences. For example, we find, in general, that the 
LNA tends to underestimate parameter values (and slightly exaggerates the confidence in these 
estimates) compared to those obtained under the modified innovation scheme. 

We also compared the predictive distributions obtained under each inferential model. The 
within-sample predictive distribution for the observation process {Ft,t = 0,... ,4.57} can be ob¬ 
tained by integrating over the posterior uncertainty of the latent process and parameter values 
in the observation model (26). Specifically, given samples I = 1 ,...,L} from the 

marginal posterior 7r{nl^^,a\y), the predictive density at time t can be estimated by 



i=i 


Likewise, for a new experiment repeated under the same conditions, the out-of-sample predictive 
distribution for the aphid population size can be determined for each treatment combination. This 
is estimated by averaging realisations of W (obtained by applying the Euler-Maruyama approxima¬ 
tion to (24)) over draws from the marginal posterior 7r(ng^^, obtained using either Bayesian 

imputation or the LNA. Figures 6 and 7 summarise these predictive distributions for a random 
selection of treatment combinations. Both the SDMEM and LNA give a satisfactory fit to the 
observed data, with all observations within or close to the central 50% of the distribution, and no 
observation outside the equi-tailed 95% credible intervals. As expected, the SDMEM gives a better 
fit over the LNA, although there is little difference between the two. There are however noticeable 
differences in the out-of-sample predictives, especially in the lower credible bound (in Eigure 7) sug¬ 
gesting that in some situations, using the inferences made under the LNA to predict the outcome of 
future experiments can give misleading results. These differences lead us to examine the marginal 
posterior densities of the treatment-block specific birth and death rates, and over whose 
uncertainty we average. Samples from these posteriors are straightforward to obtain, using the 
posterior samples of the constituent parameters in (25). Eigure 8 shows marginal posterior densi¬ 
ties of the overall birth rates (A*-^^) associated with the six treatment-block combinations for which 
predictives are presented in Figure 7. We see distinct differences between posteriors obtained under 
the Bayesian imputation approach and the LNA approach. The posteriors displayed are indicative 
of those obtained for all treatment combinations. Moreover, similar patterns are evident in the 
overall death rates 

We obtained a minESS of 1039 under the modified innovation scheme. The LNA, however, 
clearly benefits from analytically integrating out the latent process and gave a minESS of 8908. 
Eor this example, we found that significant gains in computational efficiency were possible by 
performing the parameter updates and, for the modified innovation scheme, the path updates, 
in parallel. Eor example, updating A^j and IJ.B 2 involves calculating a product of likelihoods (or 
marginal likelihoods for the LNA) over all 3^ = 9 treatment combinations that include block 2. 
These constituent likelihoods can be calculated in parallel. Similarly, for the modified innovation 
scheme, the treatment specific path updates can be performed in parallel. Both the modified inno¬ 
vation scheme and the LNA-based scheme were again coded in C and run on a high performance 
computing cluster with 14 cores (made up of Intel Xeon 3.0GHz processors). The modified inno¬ 
vation scheme took approximately 18 days to run whereas the LNA-based scheme required only 
approximately 4.3 days. Note that here the speed advantage of the LNA-based scheme has reduced, 
now being roughly 4 times faster than the modified innovation scheme, whereas in Section 5.1, the 
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Figure 6: Within sample predictive distributions for Bayesian imputation (top 2 rows) and LNA 
(bottom 2 rows). The red crosses indicate the observed values. 
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Figure 7: Out-of-sample predictive intervals for the aphid population size against time for 

a random selection of treatment combinations. The mean is depicted by the solid line with the 
dashed representing a 95% credible region. Black: Bayesian imputation. Red: LNA. 








Figure 8: Marginal posterior densities for a random selection of the birth rates associated with 
specihc treatment combinations in the aphid model. Black: Bayesian imputation. Red: LNA. 
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LNA was approximately 20 times faster. The intractability of the ODEs driving the LNA clearly 
plays a significant role in computational efficiency. In terms of overall efficiency (as measured by 
minESS/sec) the LNA~based scheme outperforms the Bayesian imputation approach by a factor 
of around 36. These computational advantages of the LNA must be weighted against the inaccu¬ 
racies of the resulting posterior and predictive distributions, inaccuracies which can at times be 
substantial, as demonstrated by the simulation study in the supplementary material. 

6 Discussion 

We have provided a framework that permits (simulation-based) Bayesian inference for a large class 
of multivariate SDMEMs using discrete-time observations that may be incomplete and subject 
to measurement error. By adopting a Bayesian imputation approach, we have shown how the 
modified innovation scheme of Golightly and Wilkinson (2008), which is necessary for overcoming 
the problematic dependence between the latent process and any parameters that enter the diffusion 
coefficient, can be applied to SDMEMs. Fundamental to our approach is the development of a novel 
bridge construct that can be used to sample a discretisation of a conditioned diffusion process, and 
does not break down when the process exhibits strong nonlinearity over inter-observation times of 
interest. The computational cost of the Bayesian imputation scheme is dictated by the number of 
imputed points (characterised by m) between observation times. In the examples considered here 
we see little difference in posterior output under the Bayesian imputation scheme for m > 20. 

We also considered a tractable approximation to the SDMEM, the linear noise approximation, 
and provided a systematic comparison using two applications. The computational efficiency of the 
LNA depends on the dimension of the SDE driving the SDMEM. For a d-dimensional SDE system, 
the LNA requires the solution of a system of order coupled ODEs. In our first application, 
the resulting ODE system can be solved analytically, leading to increases in both computational 
and overall efficiency (as measured by minimum ESS per second) of around a factor of 20. More¬ 
over, we found little difference in the accuracy of inferences made under the LNA and imputation 
approaches. In our second application, we fitted the diffusion approximation of a Markov jump 
process description of aphid dynamics using data from Matis et al. (2008). In this example, the 
ODE system governing the LNA is intractable and the computational advantage of using the LNA 
over an imputation approach reduced to around a factor of 4. However, the benefit of using the 
LNA to analytically integrate over the latent process is clear, giving an overall increase in efficiency 
of around a factor of 36. It is important to note that whilst the LNA is preferred in terms of overall 
efficiency for the examples considered here, as the dimension d of the SDE is increased, the LNA is 
likely to become infeasible. Moreover, whilst both the imputation and LNA approaches provided a 
reasonable fit to the aphid data, differences were found between the parameter posteriors, leading 
to differences in the out-of-sample predictive distributions. A simulation study (given in the sup¬ 
plementary material) highlighted further differences between the LNA and Bayesian imputation 
approaches. Care must therefore be taken in trying to fit the SDMEM by using an LNA-based 
inference approach. 
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A LNA FFBS algorithm 


To ease the notation, consider a single experimental unit and drop i from the notation. Since the 
parameters 6, ijj, b and S remain fixed throughout this section, we also drop them from the notation 
where possible. Define yo-j = {ytQ,... ■ Now suppose that Xq ~ N{a,C) a priori. The 

marginal likelihood under the LNA, 7r(y\6,'E,b), can be obtained from the forward Liter described 
below. After execution of the forward Liter, realisations of '7r{x\y, 6, S, b) can be generated using a 
backward sampler. Note that the backward sweep requires 

Cov{Xtj^^,Xtj) = Cov{Rt^^^,Rt^) = Cov{Pt^^^Rtj,Rtj) = Pti+^Var{Rt.). 


Here Pt is a d x d matrix that can be shown to satisfy the ODE 


dt 


= HtPi 




with initial condition Pq = 1^, the d x d identity matrix. 


(A.l) 


1. Forward Liter. Initialisation. Compute 'n'iyto) = d^ivto ; , P'^CF + S). The posterior at 

time to = 0 is therefore XtQ\ytQ ~ X{ao, Go), where 

ao = a + CF{F^CF + ^^Hyto - F^a) 

Go = G - CPiP^CF + . 


Store the values of ao and Go. 


2. For j = 0, 1,..., n — 1, 

(a) Prior at p-i-i. Initialise the LNA with rjtj = at^, Vtj = Ct^ and Pt^ = Id- Integrate 

the ODEs (11), (22) and (A.l) forward to p+i to obtain rjtj+i, ^L+i Hence 

Xtj+r\yo:j+i ~ X{rit 

j+i’ 

(b) One step forecast. Using the observation equation, we have that 

Rq+i \yo, ~ + S). 

Compute the updated marginal likelihood 

7r(yo:j+i) = '^{yo:j)T^{ytj+Ayo-.j) = '^{yo:j)N{ytj+^ ; F'^Vt,+i , P'^Vtj^.F + S). 

‘email: andrew.golightlyOncl .ac.uk 
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(c) Posterior at tj+i- Combining the distributions in (a) and (b) gives the joint distribution 
of and (conditional on yoij) as 


Yt. 










L %+i/ V-^ ''ij+i ^ 

and therefore Xt-j^^\yo,j+i ~ ^^(otj+i,Cj^+i), where 

at,+i = Vtj+i + Vt^+,F{F^Vtj^^F + 

Store the values of , -qt^+i , Vj^+i and . 

We sample 7r(x|y) using a backward sampler. The algorithm is as follows. 

1. Backward sampler. First draw xt„ from Xt„\y 

2. For j = re — 1, n — 2,..., 0, 

(a) Joint distribution of Xt^ and Note that Xtj\yo-.j ~ N{atj,Ctj). The joint distri¬ 

bution of Xt- and Xt-j^^ (conditional on y^-j) is 






N 


at, 


Cu 


tj+ij L yVij+iJ \Ptj+iCtj 14. 




•j+i 


(b) Backward distribution. The distribution of XtJ\Xt^_^_,,yo■.j is N{dtj,Ct.), where 


at, = at, + a - Vt,+^), 

C^.=Ct .- Ct PT Vr^ Pt Ct -. 

re tj+i tj+i tj+i 

Draw Xt, from Xt,\Xt.^,,yQ,j ~ N{dt,,Ctj). 

B Additional graphics from the orange tree growth example 

Figure 1 shows the marginal posterior densities of five randomly chosen random effects, based on 
synthetic data generated from the SDMEM of orange tree growth given in Section 5.1 of the main 
article. 


C Simulation study 


In this section, we investigate further differences between the Bayesian imputation approach and 
an inference scheme based on the LNA using synthetic data generated from (24) in the main paper. 
For simplicity, we consider a fixed treatment (low water, blanket nitrogen) and three blocks. We 
therefore write X^^^ = and consider the SDMEM 


where 






{1,2,3}, 
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Figure 1: Marginal posterior densities for a random selection of (p\ (top 2 rows) and 4>\ (bottom 2 
rows) in the orange tree growth SDMEM, together with their (overlayed) autocorrelation functions. 
Black: Bayesian imputation. Red: LNA. The vertical grey lines indicate the ground truth. 
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The fixed effects have a standard structure to incorporate block effects, with 

= A + Abj^ and = Ai + , 


where we again impose the corner constraints Ab^ = = 0 to allow for identifiability. 

To mimic the real dataset, we took A = 1.75, // = 0.00095, Ab 2 = —0.1154, Abs = —0.0225, 
^B 2 = —0.0004 and //b 3 = 0.0002. For each block, we generated five observations (on a regular grid) 
by using the Euler-Maruyama approximation with a small time-step (At = 0.001) and an initial 
condition of xq = (5,5)^. To assess the impact of measurement error on the quality of inferences 
that can be made about each parameter, we corrupted our data via the observation model 




indep 

rsj 




t = 0,l,2,3,4, 


and took a G {0,0.5,1,5} to give four synthetic datasets. We adopt the same prior specification 
for the unknown parameters as used in the real data application. 

Both the modified innovation scheme (again incorporating the improved bridge construct) and 
the LNA-based inference scheme were run long enough to yield a sample of approximately lOK 
independent posterior draws. For the former, we fixed the discretisation level by taking m = 20 
and note that m > 20 gave little difference in posterior output. Figure 2 shows the marginal 
posterior densities of the baseline parameters (A and ^) and the measurement error variance (a). 
The joint posterior densities of (/x, A)^ are shown in Figure 3. It is clear that when fitting the 
SDMEM using the Bayesian imputation approach, the posterior samples obtained are consistent 
with the ground truth. This is true to a lesser extent when using the LNA, with the ground truth 
found in the tail of the posterior distribution in three out of the four scenarios. In fact, when using 
synthetic data with cr < 5, we see substantive differences in posterior output. As was observed 
when using real data, the LNA underestimates parameter values compared to those obtained under 
the Bayesian imputation scheme. In this case, the LNA provides a relatively poor approximation 
to the true posterior distribution. 

Increasing cr to 5 (and beyond) gives output from both schemes which is largely in agreement. 
This is intuitively reasonable, since, as the variance of the measurement process is increased, the 
ability of both inference schemes to accurately infer the underlying dynamics is diminished. Essen¬ 
tially, the relative difference between the LNA and SDE is reduced. 
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Figure 2: Marginal posterior densities for the baseline parameters and the parameter a controlling 
the observation error variance in the aphid simulation study, cr = 0 (1®* row), a = 0.5 (2°^ row), 
cj = 1 (3'’'^ row), (T = 5 (4*^ row). Black: Bayesian imputation. Red: LNA. The vertical grey lines 
indicate the ground truth. 
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Figure 3: Bivariate marginal posterior densities for the baseline parameters in the aphid simulation 
study Black: Bayesian imputation. Red: LNA. The blue cross indicates the ground truth. 


30 










